82 research outputs found
Soliton generation in CaF₂ crystalline whispering gallery mode resonators with negative thermal-optical effects
Calcium fluoride (CaF₂) crystalline whispering gallery mode resonators
(WGMRs) exhibit ultrahigh intrinsic quality factors and low-power anomalous
dispersion in the communication and mid-infrared bands, making them attractive
platforms for microresonator-based comb generation. However, their unique
negative thermo-optic effects make thermal equilibrium difficult to achieve.
To our knowledge, our experiments are the first demonstration
of soliton microcombs in Q > 10⁹ CaF₂ WGMRs. We observed soliton
mode-locking and bidirectional switching of the soliton number caused by the
negative thermo-optic effects. Additionally, we show various soliton formation dynamics,
including breathing and vibrational solitons, which can be
attributed to thermo-photomechanical oscillations. Thus, our results broaden the range of
soliton generation platforms and provide a reference for generating solitons
in WGMRs made of other materials with negative thermo-optic effects. In
the future, the ultrahigh quality factor of CaF₂ crystal cavities may enable
the generation of sub-milliwatt-level broad-spectrum soliton combs.
Comment: 4 pages, 5 figures; description of soliton generation in calcium
fluoride whispering gallery mode microresonators with negative thermo-optic
effects; ready for publication in Optics Letters
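To put "Q > 10⁹" in concrete terms, the standard relations between quality factor, resonance linewidth (Δν = ν/Q) and photon lifetime (τ = Q/(2πν)) can be evaluated directly. This is a minimal sketch, assuming a 1550 nm pump in the communication band; the abstract does not state the pump wavelength, and the function name is hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in vacuum (m/s)

def linewidth_and_lifetime(q_factor, wavelength_m):
    """Return optical frequency (Hz), resonance linewidth (Hz) and photon lifetime (s) for a given Q."""
    nu = C / wavelength_m                      # optical frequency
    linewidth = nu / q_factor                  # full width at half maximum: delta_nu = nu / Q
    lifetime = q_factor / (2 * math.pi * nu)   # photon lifetime: tau = Q / (2 * pi * nu)
    return nu, linewidth, lifetime

# ASSUMPTION: 1550 nm pump; Q = 1e9 taken as the lower bound quoted in the abstract.
nu, dnu, tau = linewidth_and_lifetime(1e9, 1550e-9)
print(f"nu = {nu / 1e12:.1f} THz, linewidth = {dnu / 1e3:.0f} kHz, lifetime = {tau * 1e6:.2f} us")
```

At Q = 10⁹ this gives a linewidth of roughly 193 kHz and a photon lifetime of about 0.8 µs; a higher Q narrows the linewidth proportionally, which is why ultrahigh-Q cavities are expected to lower the pump power needed for soliton formation.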
Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning
We revisit the estimation bias in policy gradients for the discounted
episodic Markov decision process (MDP) from a deep reinforcement learning (DRL)
perspective. The objective is formulated theoretically as the expected return
discounted over the time horizon. One of the major policy gradient biases is
the state distribution shift: the state distribution used to estimate the
gradients differs from the theoretical formulation in that it does not take
the discount factor into account. Existing discussion of the influence of this
bias in the literature has been limited to the tabular and softmax cases. Therefore,
in this paper, we extend it to the DRL setting, where the policy is
parameterized, and demonstrate theoretically how this bias can lead to suboptimal
policies. We then discuss why the empirically inaccurate implementations
with a shifted state distribution can still be effective. We show that, despite
such state distribution shift, the policy gradient estimation bias can be
reduced in the following three ways: 1) a small learning rate; 2) an
adaptive-learning-rate-based optimizer; and 3) KL regularization. Specifically,
we show that a smaller learning rate, or an adaptive learning rate such as
that used by the Adam and RMSProp optimizers, makes the policy optimization robust
to the bias. We further draw connections between optimizers and
optimization regularization to show that both KL and reverse KL
regularization can significantly rectify this bias. Moreover, we provide
extensive experiments on continuous control tasks to support our analysis. Our
paper sheds light on how successful policy gradient (PG) algorithms optimize policies in the DRL
setting and contributes insights into practical issues in DRL.
Comment: 12 pages, 9 figures
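The state distribution shift can be seen in the per-step weights of a REINFORCE-style estimator. This is a minimal sketch, assuming reward-to-go returns for a single episode; the function names are hypothetical and it is not the paper's code. Under the discounted objective, the term for step t carries a factor γᵗ; many practical implementations drop that factor, which amounts to sampling states from the undiscounted distribution.

```python
def discounted_returns(rewards, gamma):
    """Reward-to-go G_t = sum_{k >= t} gamma^(k - t) * r_k for one episode."""
    g = 0.0
    out = []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def pg_weights(rewards, gamma, include_state_discount=True):
    """Per-step weights multiplying grad log pi(a_t | s_t).

    include_state_discount=True keeps the gamma^t factor implied by the
    discounted objective; False drops it, as many practical implementations
    do, which is the state distribution shift described in the abstract.
    """
    returns = discounted_returns(rewards, gamma)
    if include_state_discount:
        return [gamma ** t * g for t, g in enumerate(returns)]
    return returns

print(pg_weights([1.0, 1.0, 1.0], 0.5))         # discounted-objective weights
print(pg_weights([1.0, 1.0, 1.0], 0.5, False))  # shifted (undiscounted-state) weights
```

The shifted version over-weights later time steps relative to the theoretical gradient; the gap grows with episode length and as γ moves away from 1.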
Towards a compact soliton microcomb fully referenced on atomic reference
A fully stabilized soliton microcomb is critical for many applications of
microresonator-based optical frequency combs. However, current
approaches to full frequency stabilization require either external
acousto-optic or electro-optic devices or auxiliary lasers and multiple
phase-locked loops, which compromises the convenience of the system. This study
explores a compact, atomically referenced, fully stabilized soliton microcomb that
directly uses a rubidium atomic optical frequency reference as the pump source;
in addition, the repetition rate (7.3 GHz) of the soliton microcomb is
phase-locked to an atomic-clock-stabilized radio frequency (RF) reference by
mechanically tuning the resonance of the optical resonator. The results
demonstrate that the stability of a comb line 0.66 THz away from the pump
line is consistent with that of the Rb87 optical reference, reaching
approximately 4 Hz at 100 s, which corresponds to a fractional frequency stability
of 2 × 10⁻¹⁴ at 100 s. Furthermore, the frequency reproducibility of the comb line was
evaluated over six days, and the standard deviation (SD) of the comb-line
frequency was found to be 10 kHz, corresponding to an
absolute deviation uncertainty of 1.3 × 10⁻¹⁰, which is technically limited by the
locking range of the soliton repetition rate. The proposed method offers a
low-power, compact solution for fully stabilized soliton microcombs.
Comment: 6 pages, 5 figures
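Locking both the pump and the repetition rate fixes every comb line, since each line sits at ν_n = ν_pump + n·f_rep. The sketch below uses that relation to estimate the mode index of the line 0.66 THz from the pump and converts the quoted 4 Hz instability into a fractional value. It is a minimal illustration, assuming a pump near 1560 nm (192.1 THz) and a line on the high-frequency side of the pump; the abstract states neither, and the names are hypothetical.

```python
F_REP = 7.3e9      # repetition rate from the abstract (Hz)
OFFSET = 0.66e12   # offset of the measured comb line from the pump, from the abstract (Hz)
F_PUMP = 192.1e12  # ASSUMED pump frequency (~1560 nm); not stated in the abstract

def comb_line(f_pump, f_rep, n):
    """Frequency of the n-th comb line relative to the pump: nu_n = nu_pump + n * f_rep."""
    return f_pump + n * f_rep

def fractional_instability(delta_f_hz, f_hz):
    """Convert an absolute frequency deviation (Hz) into a dimensionless fractional value."""
    return delta_f_hz / f_hz

n = round(OFFSET / F_REP)               # nearest mode index to a 0.66 THz offset
nu_n = comb_line(F_PUMP, F_REP, n)      # optical frequency of that comb line
sigma = fractional_instability(4.0, nu_n)
print(n, f"{nu_n / 1e12:.3f} THz", f"{sigma:.2e}")
```

Under this assumption, 4 Hz at roughly 193 THz is about 2 × 10⁻¹⁴, consistent with the figure in the abstract; the exact value shifts slightly with the true pump frequency.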
Aortic valve morphology and paravalvular leak regression after a self-expandable transcatheter aortic valve replacement
Aims: The study aimed to compare paravalvular leak (PVL) changes after transcatheter aortic valve replacement (TAVR) with a self-expandable prosthesis between different aortic valve morphologies and to evaluate the impact of PVL regression on clinical prognosis.
Methods: Patients with aortic stenosis (AS) successfully treated with self-expandable TAVR who were followed up for at least 1 year at our centre were consecutively enrolled from January 2016 to August 2019. Paired serial changes in PVL and other haemodynamic parameters measured by echocardiography were collected and compared between patients with a bicuspid aortic valve (BAV) and those with a tricuspid aortic valve (TAV). A logistic regression model was used to explore the predictors of PVL regression (<1 grade) 1 year after TAVR, and its impact on subsequent clinical outcomes (all-cause mortality and rehospitalisation for heart failure (HF)) was further evaluated using Kaplan–Meier analysis.
Results: A total of 153 BAV and 114 TAV patients were enrolled; haemodynamic parameters and PVL severity were comparable between the two groups before discharge. The peak transaortic velocity, mean transvalvular gradient, and effective orifice area all improved significantly (p < 0.05), without intergroup differences at any follow-up timepoint. Significant PVL reduction was observed only in the TAV group (1.75% vs. 4.39%, p = 0.029), while moderate PVL remained more prevalent in the BAV group (7.19% vs. 1.75%, p = 0.041) at the 1-year follow-up. Multivariable analyses identified BAV, asymmetric calcification, and undersizing as independent predictors of failure of PVL reduction at 1 year in patients with mild or moderate PVL after discharge. Patients without PVL reduction within 1 year showed relatively higher 2-year rates of all-cause mortality and HF rehospitalisation thereafter (HR: 5.994, 95% CI: 1.691–21.240, p = 0.053).
Conclusion: In AS patients after self-expandable TAVR, PVL regression within 1 year was less prevalent with bicuspid valve morphology. Failure of PVL reduction might lead to an increased risk of poorer long-term prognosis.
Mastering Complex Control in MOBA Games with Deep Reinforcement Learning
We study the reinforcement learning problem of complex action control in
Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far
more complicated state and action spaces than those of traditional 1v1 games
such as Go and the Atari series, which makes it very difficult to search for
policies with human-level performance. In this paper, we present a deep
reinforcement learning framework that tackles this problem from the perspectives
of both system and algorithm. Our system has low coupling and high
scalability, which enables efficient exploration at large scale. Our algorithm
includes several novel strategies, including control dependency decoupling,
action mask, target attention, and dual-clip PPO, with which our proposed
actor-critic network can be effectively trained in our system. Tested on the
MOBA game Honor of Kings, our AI agent, called Tencent Solo, can defeat top
professional human players in full 1v1 games.
Comment: AAAI 202
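Two of the listed strategies are compact enough to sketch per sample. Dual-clip PPO, as it is commonly described, adds a second lower bound c·A (with c > 1) to the standard clipped surrogate when the advantage A is negative, so a very large probability ratio cannot drive the objective arbitrarily low. An action mask is often implemented by forcing the logits of illegal actions to minus infinity before the softmax. This is a minimal sketch, not the authors' implementation; the function names and the values eps = 0.2 and c = 3.0 are illustrative assumptions.

```python
import math

def dual_clip_ppo_objective(ratio, advantage, eps=0.2, c=3.0):
    """Per-sample surrogate objective (to be maximized) with a dual clip.

    ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage A.
    eps: standard PPO clip range; c > 1: extra lower bound applied only when A < 0.
    eps and c defaults are illustrative assumptions.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    standard = min(ratio * advantage, clipped_ratio * advantage)  # vanilla PPO clip
    if advantage < 0:
        # Second clip: bound how negative the objective can get for huge ratios.
        return max(standard, c * advantage)
    return standard

def masked_softmax(logits, legal):
    """Softmax over logits with illegal actions given probability zero."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, legal)]
    m = max(masked)  # assumes at least one legal action
    exps = [math.exp(x - m) for x in masked]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

print(dual_clip_ppo_objective(10.0, -1.0))  # dual clip caps the loss at c * A
print(masked_softmax([1.0, 2.0, 3.0], [True, False, True]))
```

With a ratio of 10 and A = −1, standard PPO would return −10, while the dual clip bounds the objective at c·A = −3; the masked action always receives probability zero regardless of its logit.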